Adaptive Sampled Softmax with Kernel Based Sampling

Authors

  • Guy Blanc
  • Steffen Rendle
Abstract

Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. The cost of a softmax model is linear in the number of classes, which makes it too expensive for many real-world problems. A common approach to speeding up training is to sample only some of the classes at each training step. This method is known to be biased, and the bias grows the more the sampling distribution deviates from the output distribution. Nevertheless, almost all recent work uses simple sampling distributions that require a large sample size to mitigate the bias. In this work, we propose a new class of kernel based sampling methods and develop an efficient sampling algorithm. Kernel based sampling adapts to the model as it is trained, resulting in low bias. It can be easily applied to many models because it relies only on the model’s last hidden layer. We empirically study the trade-off between bias, sampling distribution, and sample size and show that kernel based sampling results in low bias with few samples.
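
To make the idea in the abstract concrete, the following is a minimal sketch of sampled softmax with a sampling distribution that depends on the last hidden layer. It is an illustration under stated assumptions, not the authors' implementation: the quadratic form `alpha * <h, w_i>^2 + 1` is only one possible kernel, the logit correction `-log(m * q_i)` is a commonly used form for sampled negatives, and all names (`quadratic_kernel_probs`, `sampled_softmax_loss`, `alpha`, `num_samples`) are hypothetical. The sketch also scores every class to build the sampling distribution, so it does not reproduce the efficient sampling algorithm the abstract mentions.

```python
import numpy as np

def quadratic_kernel_probs(h, W, alpha=1.0):
    """Sampling distribution q_i proportional to alpha * <h, w_i>^2 + 1.

    Assumed kernel for illustration; it depends only on the hidden
    vector h and the class embeddings W, so it adapts as they change.
    """
    scores = alpha * (W @ h) ** 2 + 1.0  # strictly positive weights
    return scores / scores.sum()

def sampled_softmax_loss(h, W, target, q, num_samples, rng):
    """Cross-entropy over the target plus sampled negatives.

    Negative logits are corrected by -log(m * q_i); the target logit is
    left uncorrected. Negatives are drawn with replacement and may
    include the target, which this sketch does not treat specially.
    """
    negatives = rng.choice(W.shape[0], size=num_samples, replace=True, p=q)
    neg_logits = W[negatives] @ h - np.log(num_samples * q[negatives])
    logits = np.concatenate(([W[target] @ h], neg_logits))
    logits -= logits.max()  # shift for numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy data: 1000 classes, 16-dimensional last hidden layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 16))
h = rng.normal(size=16)

q = quadratic_kernel_probs(h, W)
loss = sampled_softmax_loss(h, W, target=3, q=q, num_samples=20, rng=rng)
```

Here the loss is computed from 21 logits instead of 1000. How close its gradient is to the full softmax gradient depends on how well `q` tracks the model's output distribution, which is the trade-off the abstract describes.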

Similar resources

TAPAS: Two-pass Approximate Adaptive Sampling for Softmax

TAPAS is a novel adaptive sampling method for the softmax model. It uses a two pass sampling strategy where the examples used to approximate the gradient of the partition function are first sampled according to a squashed population distribution and then resampled adaptively using the context and current model. We describe an efficient distributed implementation of TAPAS. We show, on both synth...

Interactive Out-Of-Core Texturing Using Point-Sampled Textures

The visualization of huge 3D objects becomes available on common workstations thanks to highly optimized data-structures and out-of-core frameworks for rendering. However, the editing, and in particular, the texturing of such objects is still a challenging task, since usual methods for optimized rendering are not easily amenable to interactive modification. In this paper, we introduce the idea ...

Interactive Out-Of-Core Texturing with Point-Sampled Textures

The visualization of huge 3D objects becomes available on common workstations thanks to highly optimized data-structures and out-of-core frameworks for rendering. However, the editing, and in particular, the texturing of such objects is still a challenging task, since usual methods for optimized rendering are not easily amenable to interactive modification. In this paper, we introduce the idea ...

Matrix Approximation for Large-scale Learning

Modern learning problems in computer vision, natural language processing, computational biology, and other areas are often based on large data sets of tens of thousands to millions of training instances. However, several standard learning algorithms, such as kernel-based algorithms, e.g., Support Vector Machines, Kernel Ridge Regression, Kernel PCA, do not easily scale to such orders of magnitu...

Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling of spatial data suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampl...

Journal:
  • CoRR

Volume abs/1712.00527

Publication year 2017